REFRACTIVE: An Open Source Tool to Extract Knowledge from Syntactic and Semantic Relations

Authors

  • Peter Exner
  • Pierre Nugues
Abstract

The extraction of semantic propositions has proven instrumental in applications such as IBM Watson (Ferrucci, 2012) and Google's Knowledge Graph (Singhal, 2012). One of the core components of IBM Watson is the PRISMATIC knowledge base, which consists of one billion propositions extracted from the English version of Wikipedia and the New York Times (Fan et al., 2010). However, extracting propositions from the English version of Wikipedia is time-consuming: in practice, the task requires multiple machines and a distributed computation that involves a good deal of systems engineering. In this paper, we describe REFRACTIVE, an open-source tool built on the Hadoop implementation of MapReduce that extracts propositions from a parsed corpus. The complete process consists of a parsing step and an extraction step; here we focus on extraction from the parsed corpus. We hope this tool will help computational linguists speed up the development of applications.
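To illustrate how proposition extraction from a parsed corpus splits into map and reduce steps, here is a minimal in-process Python sketch. It is not REFRACTIVE's code or API. It assumes a hypothetical input format in which each sentence is a list of (index, form, head, deprel) dependency tuples using `nsubj`/`dobj` labels, and it treats a proposition as a subject-verb-object triple counted across the corpus. On Hadoop, sentences would be split across mappers and the framework itself would perform the shuffle that `shuffle` simulates here.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical token format: (index, form, head_index, deprel); head 0 = root.
Token = tuple[int, str, int, str]
Triple = tuple[str, str, str]


def map_sentence(tokens: list[Token]) -> Iterator[tuple[Triple, int]]:
    """Map step: emit ((subject, verb, object), 1) for each head word that
    has both a nominal subject and a direct object among its dependents."""
    forms = {idx: form for idx, form, _, _ in tokens}
    subjects: dict[int, str] = {}
    objects: dict[int, str] = {}
    for _, form, head, deprel in tokens:
        if deprel == "nsubj":
            subjects[head] = form
        elif deprel == "dobj":
            objects[head] = form
    for verb_idx in subjects.keys() & objects.keys():
        yield (subjects[verb_idx], forms[verb_idx], objects[verb_idx]), 1


def shuffle(pairs: Iterable[tuple[Triple, int]]) -> dict[Triple, list[int]]:
    """Group mapper output by key (Hadoop does this between map and reduce)."""
    groups: dict[Triple, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_proposition(key: Triple, values: list[int]) -> tuple[Triple, int]:
    """Reduce step: sum the occurrence counts of one proposition."""
    return key, sum(values)


def run_job(corpus: Iterable[list[Token]]) -> dict[Triple, int]:
    mapped = (pair for sentence in corpus for pair in map_sentence(sentence))
    return dict(reduce_proposition(k, vs) for k, vs in shuffle(mapped).items())


corpus = [
    [(1, "Watson", 2, "nsubj"), (2, "answers", 0, "root"), (3, "questions", 2, "dobj")],
    [(1, "Watson", 2, "nsubj"), (2, "answers", 0, "root"), (3, "questions", 2, "dobj")],
    [(1, "Hadoop", 2, "nsubj"), (2, "distributes", 0, "root"), (3, "work", 2, "dobj")],
]
print(run_job(corpus))
# {('Watson', 'answers', 'questions'): 2, ('Hadoop', 'distributes', 'work'): 1}
```

Because each sentence is mapped independently, the map step parallelizes trivially across machines; only the aggregation of identical propositions requires the shuffle.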

Similar resources

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

A text categorisation tool for open source communities based on semantic analysis

Open source software (OSS) projects are supported by communities interacting through software repositories and mailing lists. Thousands of contributors participate in the development of the projects although they rarely meet each other. The result is a huge archived repository with thousands of questions, answers and contributions usually difficult to explore. We propose a tool based on semanti...

A Language Model for Extracting Implicit Relations

Open Information Extraction has shown promise of overcoming a knowledge engineering bottleneck, but has a fundamental limitation. It is unable to extract implicit relations, where the sentence lacks an explicit relation phrase. We present IMPLIE (Implicit relation Information Extraction) that uses an open-domain syntactic language model and user-supplied semantic taggers to overcome this limita...

Automatic Semantic Role Labeling of Persian Sentences Using Dependency Trees

Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Publication date: 2014